Compound specific isotope analysis of individual amino acids (CSIA-AA) is a new and exciting tool in the study of trophic ecology and animal nutrition. CSIA-AA are not constrained by the same spatiotemporal variability and fractionation effects observed in stable isotope analyses (SIA) of bulk tissues (carbons + lipids + proteins). Therefore, CSIA-AA represent a sea change in our ability to understand food webs and trophic interactions.
Certain amino acids can only be produced by primary producers (bacteria, algae, plants, fungi). These de novo synthesized amino acids, termed ‘essential amino acids (AAESS),’ must be gained by animals through feeding/dietary means. Essential amino acids are passed from producers to consumers with limited fractionation or transamination, making them an ideal tool to examine the sources of carbon in consumers. Moreover, essential amino acids differ in their isotope signatures due to the biochemical pathways found in different groups of producers (i.e., fungi, bacteria, microalgae, C3/C4/CAM plants), allowing producers signatures to be mapped with high resolution. By using multiple essential amino acids as a multivariate trait, an essential amino acid fingerprint can be used to identify the source of amino acids in consumers and the contributions and identify of producers in diets.
Terrestrial energy channels (collevtively, “allochthony”) in aquatic systems is a resource for aquatic consumers and ecosystems, although its significance has been debated. Zooplankton prefer (and require) the high fatty acid content of microalgae (or autochthonous energy channels), which is a vital nutritional source.Terrestrial materials entering lakes can leads to browning and a decrease in microalgae productivity as light is attenuated. In this case, allochthonous energy channels may limits primary production but can be an important alternative resource for consumers that are able to utilize the abundant (although low quality) nutritional source.
Here, we used CSIA-AA to ask whether zooplankton in high elevation alpine lakes of the Sierra Nevada Mountains obtain their essential amino acids from in-water producers (particulate organic matter of microalgae origin) or from terrestrial C3 plants (sedges, pines, broadleaf deciduous trees). We sampled 7 lakes and 1 dystropic pond across an elevation gradient (2500-3200m) and measured environmental conditions (temperature, pH, DOC, TN, TP, chlorophyll-a), measuring AA-carbon isotope values in producers and plankton consumers picked to individual species as well as a community size-fraction (>350um).
The map here will shows sampling locations in the Sierra Nevada
Mountains. Note you must have a Google API to generate maps in
get_googlmaps.
# load data
SNL.env<-read.csv("data/SNL.envdata.csv")
API<-read.csv("data/API_key.csv")
API.key<-API[1,1]
#quick map
# ggplot(SNL.env, aes(x = longitude, y = latitude)) + coord_quickmap() + geom_point()
######## using ggmap
register_google(key=API.key)
########
#California Map
########
CA.map<-get_googlemap(center=c(-121, y = 37), zoom = 6, source="google", maptype="hybrid",
style=c(feature="poi",element="labels",visibility="off"))
# "poi" is to remove places of interest from map, other options avaialble too...
# no cities, roads, just "CA"...
# style = 'feature:road|element:all|visibility:simplified&style=feature:administrative.locality|element:labels|visibility:off')
# https://developers.google.com/maps/documentation/maps-static/styling#features
CA.map_for_man <- ggmap(CA.map) +
geom_point(aes(x = -119, y = 38), pch=23,colour="black",fill="mediumseagreen", size = 3, stroke=0.5) +
xlab("longitude") + ylab("latitude") +
theme(text = element_text(size=6),
plot.margin = unit(c(0.2, 0.5, 0.2, 0.2), "cm"))
#ggsn::scalebar(x.min=-124, x.max=-123, y.min=32.5, y.max=33, dist=2000, dist_unit="km",transform=TRUE,st.bottom=FALSE, st.size=2, box.fill=c("gray50", "white"), model="WGS84",st.color="white", border.size=0.5)
##########
# site map
##########
SNL.rev=c(x=-119.25, y = 37.75)
map.SNL<-get_map(SNL.rev,
zoom=9,
scale = 2,
maptype= "satellite",
source="google", extent= "device", legend="topright")
## sites lat long
lat.long<-SNL.env %>%
dplyr::select(Lake, latitude, longitude, elevation..m)
SNL.sites<-
ggmap(map.SNL)+
geom_point(aes(x=longitude, y=latitude), data=lat.long, alpha=0.8, color="dodgerblue", size=4)+
geom_point(aes(x=longitude, y=latitude), data=lat.long, alpha=1, color="black", size=4, pch=21)+
labs(x="longitude", y="latitude") +
theme(text = element_text(size=6),
plot.margin = unit(c(0.2, 0.5, 0.2, 0.2), "cm"))
#scale_y_continuous(limits=c(19.81785, 19.8242))+
#scale_x_continuous(limits=c(-155.3223, -155.311)) +
#annotate("text", x=-155.3187, y=19.8225, label= "KP", size=3, col="white") +
#annotate("text", x=-155.31478, y=19.8224, label= "RK", size=3, col="white") +
#ggsn::scalebar(x.min=-155.314, x.max=-155.311, y.min=19.818, y.max=19.824, dist=100,
#dist_unit="m", transform=TRUE,
#st.bottom=FALSE, st.size=2, box.fill=c("gray50", "white"), model="WGS84",st.color="white", border.size=0.5)
###
site.plots<-plot_grid(CA.map_for_man, SNL.sites,
labels=c('A', 'B'), label_size=8, hjust=-1, vjust= 6, ncol=2, nrow=1)
### export it
pdf(file= "figures/Fig1.sites_AB.pdf", height=4, width=8)
site.plots
dev.off()
Data here is from:
Symons, C. C., M. A. Schulhof, H. B. Cavalheri, and J. B. Shurin.
2019. Antagonistic effects of temperature and dissolved organic carbon
on fish growth in California mountain lakes. Oecologia 189: 231–241. doi:10.1007/s00442-018-4298-9.
In this study they sampled fish, grass, pines, plankton, and POM for carbon and nitrogen isotope analyses. The figure details that there is considerable overlap and high variance in these groups, making it difficult to determine degrees of allochthony contribution to consumers.
##### Bulk stable isotope analyses (SIA)
bulk.iso<-read.csv("data/Symons iso data/Symons_Yos_bulkSIA.csv")
#d13C vs d15N
d13C.d15N<-ggplot(bulk.iso, aes(x=d13C, y=d15N, color=type)) +
scale_color_manual(values=c("orchid", "lightgoldenrod3", "mediumseagreen", "royalblue2", "coral")) +
xlab(expression(paste(delta^{13}, C, " (\u2030, V-PDB)")))+
ylab(expression(paste(delta^{15}, N, " (\u2030, Air)")))+
coord_cartesian(xlim=c(-45, -5), ylim=c(-6, 10)) +
labs(color = "")+
geom_point() +
stat_ellipse()+
theme_classic()
ggsave("figures/FigS1.Bulk_13C.15N.pdf", encod="MacRoman", height=6, width=7)
d13C.d15N
Figure. Biplot of bulk tissue stable isotope analysis for carbon (δ13C) and nitrogen (δ15N) in fish, grasses, pines, POM, and zooplankton from montane lakes in the Eastern Sierra Nevada Mountains of California (data from Symons et al. 2019).
Load in the full, standardized data (not to be confused with mean-normalized data) and pull out the essential amino acids (AAESS), and subset by sample type. We will be excluding Lysine due to poor resolution.
SNL.d13C.aa<- read.csv("data/Sierra Nevada Lakes AA d13C Data NMX standard.csv")
#Pull out just the essential AA values (excluding Lys)
SNL.ess <- data.frame(SNL.d13C.aa$SampleID, SNL.d13C.aa$Ile13C, SNL.d13C.aa$Leu13C, SNL.d13C.aa$Phe13C,
SNL.d13C.aa$Thr13C, SNL.d13C.aa$Val13C)
#Rename columns
colnames(SNL.ess) <- c("SampleID", "Ile", "Leu", "Phe", "Thr", "Val")
SNL.ess$SampleID <- as.character(SNL.ess$SampleID)
#Subset terrestrial plants
plants <- SNL.ess[substr(SNL.ess$SampleID,1,11)=="SNL-TERPLA-",]
#Add a column for group
plants$Group <- "Plants"
#Subset POM
pom <- SNL.ess[substr(SNL.ess$SampleID,1,8)=="SNL-POM-",]
#Add a column for group
pom$Group <- "POM"
#Write csv file - primary producer d13C for just ess AA - NMX standard
prod <- rbind(plants, pom)
write.csv(prod, "output/SNL_producers_ess_AA_data.csv", row.names=FALSE)
#Subset zooplankton
zoops <- SNL.ess[substr(SNL.ess$SampleID,1,9)=="SNL-ZOOP-",]
#Pull in zooplankton metadata
zoops_meta <- read.csv("data/Zoops_Meta.csv")
zoops <- merge.data.frame(zoops_meta, zoops, by.x = "SampleID", by.y = "SampleID")
#remove TB tanks
zoops<-zoops[!(zoops$Lake=="TB-tanks"),]
#Write csv file - zooplankton d13C for just ess AA - UNM standard
write.csv(zoops, "output/SNL_zooplankton_ess_AA_data.csv", row.names=FALSE)
Arrange and format the dataframe for ggplot. We will use this data for linear model tests to determine which (if any) AAESS are different among these 3 groups. Make a boxplot of this data.
# function to bind the data frames together
bind_cols_fill <- function(df_list) {
max_rows <- map_int(df_list, nrow) %>% max()
map(df_list, function(df) {
if(nrow(df) == max_rows) return(df)
first <- names(df)[1] %>% sym()
df %>% add_row(!!first := rep(NA, max_rows - nrow(df)))
}) %>% bind_cols()
}
ESS_df.wide<-as.data.frame(bind_cols_fill(list(tibble(plants$Ile), tibble(pom$Ile), tibble(zoops$Ile),
tibble(plants$Leu), tibble(pom$Leu), tibble(zoops$Leu),
tibble(plants$Phe), tibble(pom$Phe), tibble(zoops$Phe),
tibble(plants$Thr), tibble(pom$Thr), tibble(zoops$Thr),
tibble(plants$Val), tibble(pom$Val), tibble(zoops$Val))))
# select columns that match the producers
library(dplyr)
zoop.sub<- zoops %>%
dplyr::select(SampleID, Ile, Leu, Phe, Thr, Val, Group, Type, Lake)
## with bind rows bring zoops and producers together
prod$Type<-"NA"
prod$Lake<-"NA"
# the above to "check yourself!", but the code below will set NAs by default
ESS.df.long<- bind_rows(prod, zoop.sub)
ESS.df.long$Lake[ESS.df.long$Lake == "NA"] <- "Source"
# make new level for "phy.group" based on taxonomy
ESS.df.long$phy.group<- ifelse(ESS.df.long$Type == "calanoid", "copepoda",
ifelse(ESS.df.long$Type == "calanoid_cyclopoid", "copepoda",
ifelse(ESS.df.long$Type == "Ceriodaphnia", "cladocera",
ifelse(ESS.df.long$Type == "Daphnia", "cladocera",
ifelse(ESS.df.long$Type == "Holopedium", "cladocera",
ifelse(ESS.df.long$Type == "Ceriodaphnia", "cladocera",
ifelse(ESS.df.long$Type == ">350um", ">350um",
ifelse(ESS.df.long$Group == "Plants", "Plants",
"POM"))))))))
######### figure in ggplot
# plants
Ile.plants<-as.data.frame(cbind(plants$Group, "Source", "Isoleucine", plants$Ile))
Leu.plants<-as.data.frame(cbind(plants$Group, "Source", "Leucine", plants$Leu))
Phe.plants<-as.data.frame(cbind(plants$Group, "Source", "Phenylalanine", plants$Phe))
Thr.plants<-as.data.frame(cbind(plants$Group, "Source", "Threonine", plants$Thr))
Val.plants<-as.data.frame(cbind(plants$Group, "Source", "Valine", plants$Val))
plants.reshape.df<- rbind(Ile.plants, Leu.plants, Phe.plants, Thr.plants, Val.plants)
colnames(plants.reshape.df)<-c("Group", "Lake", "EAA", "d13C")
#POM
Ile.pom<-as.data.frame(cbind(pom$Group, "Source", "Isoleucine", pom$Ile))
Leu.pom<-as.data.frame(cbind(pom$Group, "Source", "Leucine", pom$Leu))
Phe.pom<-as.data.frame(cbind(pom$Group, "Source", "Phenylalanine", pom$Phe))
Thr.pom<-as.data.frame(cbind(pom$Group, "Source", "Threonine", pom$Thr))
Val.pom<-as.data.frame(cbind(pom$Group, "Source", "Valine", pom$Val))
pom.reshape.df<- rbind(Ile.pom, Leu.pom, Phe.pom, Thr.pom, Val.pom)
colnames(pom.reshape.df)<-c("Group", "Lake", "EAA", "d13C")
#zoops
Ile.zoops<-as.data.frame(cbind(zoops$Group, zoops$Lake, "Isoleucine", zoops$Ile))
Leu.zoops<-as.data.frame(cbind(zoops$Group, zoops$Lake, "Leucine", zoops$Leu))
Phe.zoops<-as.data.frame(cbind(zoops$Group, zoops$Lake, "Phenylalanine", zoops$Phe))
Thr.zoops<-as.data.frame(cbind(zoops$Group, zoops$Lake, "Threonine", zoops$Thr))
Val.zoops<-as.data.frame(cbind(zoops$Group, zoops$Lake, "Valine", zoops$Val))
zoops.reshape.df<- rbind(Ile.zoops, Leu.zoops, Phe.zoops, Thr.zoops, Val.zoops)
colnames(zoops.reshape.df)<-c("Group", "Lake", "EAA", "d13C")
#compiled df
raw.EAA.reshape<-rbind(plants.reshape.df, pom.reshape.df, zoops.reshape.df)
raw.EAA.reshape$EAA<-recode_factor(raw.EAA.reshape$EAA,
Isoleucine = "Ile",
Leucine = "Leu",
Phenylalanine = "Phe",
Threonine = "Thr",
Valine = "Val")
raw.EAA.reshape$Group<-recode_factor(raw.EAA.reshape$Group, Plants = "Terrestrial Plants")
#restructure
raw.EAA.reshape$Group<-as.factor(raw.EAA.reshape$Group)
raw.EAA.reshape$EAA<-as.factor(raw.EAA.reshape$EAA)
raw.EAA.reshape$d13C<-as.numeric(raw.EAA.reshape$d13C)
#Now, that the data is structured, make a boxplot for each AA~ESS~ by sample type (terrestrial plants, POM, zooplankton).
colors2<-c("springgreen3", "dodgerblue", "coral")
####### plot by EAA, with groups of sources and consumers
EAA.box<-ggplot(raw.EAA.reshape, aes(x=EAA, y=d13C, fill=Group))+
geom_boxplot(alpha=0.8)+
#geom_jitter(aes(fill=Group), colour="black",pch=21, size=1)+
scale_fill_manual(values=colors2)+
ylab(expression(paste("AA ", delta^{13}, C, " (\u2030)"))) +
xlab("Essential Amino Acids") +
theme_classic()+
annotate('text', x = 4, y = -8, label='*', size=6)+
annotate('text', x = 3, y = -23, label='*', size=6) + guides(fill = "none")
EAA.box
Figure. Boxplot of essential amino acids (AAEES) δ13C values for terrestrial plants, particulate organic matter (POM), and zooplankton in aquatic habitats in the Eastern Sierras Nevada mountains.
dev.copy(pdf, "figures/FigS3.SNL.EssAA.Boxplot.pdf", width = 6, height = 6, encod="MacRoman")
dev.off()
Make point-line plots (mean +/-SE) for the individual AAESS pooled across samples for each lake to test for the influence of site on baseline carbon isotope values.
# arrange by elevation
raw.EAA.reshape$Lake<- factor(raw.EAA.reshape$Lake,
levels=c("Lukens", "MayPond", "Sunrise2", "Blue", "Greenstone",
"Cascade", "EasternBrook", "Granite2", "Source"))
# group by sample type, lake, and EAA, then calculate the mean d13C for EAA for each group
raw.EAA.reshape.grouped <- raw.EAA.reshape %>%
group_by(Group) %>%
group_by(Lake, .add = TRUE) %>%
group_by(EAA, .add = TRUE) %>%
dplyr::summarize(Mean = mean(d13C, na.rm = TRUE), n = n(), SD = sd(d13C, na.rm = TRUE), SE = SD/sqrt(n))
# remove sources because we are only looking at differences across sites for zooplankton (larger sample sizes)
raw.EAA.reshape.grouped <- raw.EAA.reshape.grouped[!(raw.EAA.reshape.grouped$Lake=="Source"),]
# plot
indiv.EAA.plot<-ggplot(raw.EAA.reshape.grouped, aes(x = EAA, y = Mean, fill = Lake)) +
theme_classic() +
geom_point(size = 3, aes(color = Lake)) +
geom_errorbar(aes(ymin = Mean-SE, ymax = Mean+SE, color = Lake), width = 0.0) +
ylab(expression(paste(delta^{13},C, " (\u2030)"))) +
theme(axis.text = element_text(size = 12, color = "black"), axis.title = element_text(size = 14, color = "black")) +
geom_line(data = raw.EAA.reshape.grouped, aes(x = EAA, y = Mean, color = Lake, group = Lake)) +
theme(aspect.ratio = 0.8) +
guides(color = guide_legend(reverse = TRUE)) +
guides(fill = FALSE)
indiv.EAA.plot
Figure. Mean (± standard error) δ13C values of individual essential amino acids at each location and averaged across zooplankton groups
###############
pdf(file= "figures/FigS4A.indivAA_Lines.pdf", height=7, width=6, encod="MacRoman")
plot_grid(indiv.EAA.plot)
dev.off()
# test whether site really matters and explore this effect more. We will not evaluate this section but it is useful for inspection. Note that the plants and POM were pooled across all locations, so we do not have the ability to test elevation/site level differences in sources with this dataset.
ggplot(raw.EAA.reshape, aes(x=Lake, y=d13C, fill=Group))+
geom_boxplot(alpha=0.8)+
geom_point(aes(fill=Group), colour="black",pch=21, size=1.5, alpha=0.5)+
scale_fill_manual(values=colors2)+
ylab(expression(paste(delta^{13}, C[EAA], " (\u2030)"))) +
xlab("Site: low-to-high elevation") +
theme_classic()
####
# are these lakes significantly different for raw EAAs?
lake.df<- raw.EAA.reshape[!(raw.EAA.reshape$Lake=="Source"),]
lake.mod<-lm(d13C~ Lake, data=lake.df)
anova(lake.mod) # significant at <0.001
# make and save the plot
plot(allEffects(lake.mod), ylab=expression(paste(delta^{13}, C[EAA], " (\u2030)")),
par.strip.text=list(cex=0.7), xlab="Site: low-to-high elevation")
# inspect posthoc
posthoc.lake<-emmeans(lake.mod, ~Lake)
multcomp::cld(posthoc.lake, Letters=letters)
# MayPond a
# Sunrise2 abc
# Lukens ab
# EasternBrook bcd
# Cascade bcd
# Blue cd
# Greenstone d
# Granite2 d
Test for differences in individual EAAs among the three groups (producers [POM, plants], zooplankton). Which amino acids are different across groups? - These models are combined in manuscript Table S3. - Plots of individual AAESS by site are in Figure S4A
# First, get the df organized for the analyses.
# non-normalized EAAs for sources and plankton, test differences in EAA among groups for individual EAAs
ESS.mods<-ESS.df.long
# rename factors
ESS.mods <- ESS.mods %>%
dplyr::rename("orig.class" = "Type")
# make factors for columns
make.fac<-c("Group", "orig.class", "phy.group", "Lake")
ESS.mods[make.fac] <- lapply(ESS.mods[make.fac], factor)
str(ESS.mods)
# select what we want JUST the 'groups' (3 levels)
ESS.mods.df<-ESS.mods %>%
dplyr::select(SampleID,Lake, Group, orig.class, phy.group, Ile, Leu, Phe, Thr, Val)
################### ###################
# test differences in EAA among groups in linear models
################### ###################
ESS.mods.df$Source.Location<-ESS.mods.df$Lake
ESS.mods.df$Source.Location<-ifelse(ESS.mods.df$Group=="POM", "Pooled.POM",
ifelse(ESS.mods.df$Group=="Plants", "Pooled.Plants",
as.character(ESS.mods.df$Source.Location)))
ESS.mods.df$Source.Location<-as.factor(ESS.mods.df$Source.Location)
# look at assumptions
for(i in c(6:10)){
Y<-ESS.mods.df[,i]
full<-lm(Y~Group, data=ESS.mods.df, na.action=na.exclude)
R <- resid(full) #save glm residuals
op<-par(mfrow = c(2,2), mar=c(5,4,1,2), pty="sq")
plot(full, add.smooth = FALSE, which=1)
QQ <- qqnorm(R, main = colnames(ESS.mods.df)[i])
QQline <- qqline(R)
hist(R, xlab="Residuals", main = colnames(ESS.mods.df)[i])
}
#### run models to see where the groups differ from each other
# Isoleucine: no Group diff
Ile.mod<-lm(Ile~Group, data=ESS.mods.df)
anova(Ile.mod) # p=0.296
plot(allEffects(Ile.mod), ylab="Ile", par.strip.text=list(cex=0.7))
# Leucine: no Group diff
Leu.mod<-lm(Leu~ Group, data=ESS.mods.df)
anova(Leu.mod) # p=0.226
plot(allEffects(Leu.mod), ylab="Leu", par.strip.text=list(cex=0.7))
# Phenylalanine: Group signif
Phe.mod<-lm(Phe~Group, data=ESS.mods.df)
anova(Phe.mod) # p=0.0005
plot(allEffects(Phe.mod), ylab="Phe", par.strip.text=list(cex=0.7))
posthoc.Phe<-emmeans(Phe.mod, ~Group)
multcomp::cld(posthoc.Phe, Letters=letters) # POM (A), Zoop (A), Plant (B)
# Threonine: Group signif
Thr.mod<-lm(Thr~Group, data=ESS.mods.df)
anova(Thr.mod) # p=0.002
plot(allEffects(Thr.mod), ylab="Thr", par.strip.text=list(cex=0.7))
posthoc.Thr<-emmeans(Thr.mod, ~Group)
multcomp::cld(posthoc.Thr, Letters=letters) # POM (A), Zoop (AB), Plant (B)
###### Valine:
# No group diff
Val.mod<-lm(Val~Group, data=ESS.mods.df)
anova(Val.mod) # p=0.275
plot(allEffects(Val.mod), ylab="Val", par.strip.text=list(cex=0.7))
Using the standardized δ13C AAESS data, we will use a PERMANOVA to test whether the three groups (terrestrial plants, POM, zooplankton) differ from each other in multivariate space. From here we will test effects of mean-normalization (or not) on the zooplankton fractions, considering we observed signficant site level differences due to base-of-the-foodweb variance.
Take away: POM not different from zoops, but POM and plants differ from each other (p=0.008).This is not directly used in the manuscript as “3 groups”, instead we use keep taxa separated in 3 groups and compare them to plants and POM (Tabls S4B-top)
#### #### #### #### #### #### #### #### #### #### ####
#### df just for the "Groups", which drops lake and phy.group
ESS.groups<- ESS.mods.df %>% dplyr::select(-c(Lake, orig.class, phy.group))
# just AAs data
ESS.groups.dat<-ESS.groups %>%
dplyr::select(Ile, Leu, Phe, Thr, Val)
# factors
# Here: only using Group: Zooplankton, Plants, POM
Group.fac<-ESS.groups %>%
dplyr::select(Group)
# run PERMANOVA for 3 groups
set.seed(51)
EAA.perm.GROUP<-adonis2(ESS.groups.dat~ Group, data=Group.fac, permutations=999, method="euclidian")
EAA.perm.GROUP # different, p=0.008
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = ESS.groups.dat ~ Group, data = Group.fac, permutations = 999, method = "euclidian")
## Df SumOfSqs R2 F Pr(>F)
## Group 2 619.0 0.15771 4.0257 0.008 **
## Residual 43 3306.1 0.84229
## Total 45 3925.1 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
set.seed(51)
pairwise.adonis(ESS.groups.dat, factors=c(Group.fac$Group), sim.method = "euclidian", p.adjust.m = "none")
# Plants differ from POM (p=0.001), plants significant from Zooplankton (p=0.037), POM not diff from zoops
Make a non-norm PCA with values to explore the differences at the
base of the food web in δ13C values across sites.
- Remove the source since there is no information/low sample size to
test site-specific effects - Run PERMANOVA to test for site effects. -
Perform PCA to see patterns in zooplankton by sites.
Take home: multivariate patterns (PCA) and PERMANOVA show site-effects for non-normalized data. - This is in Figure S4B & Table S4-topA.
This PERMANOVA (EAA.perm.LAKE) tests which sites are
different form each other.
# non-normalized data for the lake analysis
# make the PCA df
PCA.df<-ESS.mods.df %>%
dplyr::select(Lake, phy.group, Ile, Leu, Phe, Thr, Val)
# make new level for "Elevation" (in m), "0" for ordering, but actually is "NA"
PCA.df$Lake<- factor(PCA.df$Lake, levels=c("Lukens", "MayPond", "Sunrise2", "Blue", "Greenstone",
"Cascade", "EasternBrook", "Granite2", "Source"))
# remove the sources (POM and Plants), just look at the zooplankton which have site-level replications
PCA.df.trim<-PCA.df[!(PCA.df$Lake=="Source"),]
# factors: just Lake sampling sites and phy.group (cladocera or copepoda, >350um)
PCA.fac<- PCA.df.trim %>%
dplyr::select(Lake, phy.group)
# the response variables
PCA.dat.raw<- PCA.df.trim %>%
dplyr::select(Ile, Leu, Phe, Thr, Val)
####### ####### ####### ####### #######
####### Run a PERMANOVA for Lake, *no sources* to test for differences among zooplankton by lake
# run PERMANOVA for Lake non-norm
set.seed(213)
EAA.perm.LAKE<-adonis2(PCA.dat.raw~Lake, data=PCA.fac, permutations=999, method="euclidian")
EAA.perm.LAKE # different at p=0.001, R2=0.87 at Lake level
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = PCA.dat.raw ~ Lake, data = PCA.fac, permutations = 999, method = "euclidian")
## Df SumOfSqs R2 F Pr(>F)
## Lake 7 2257.68 0.87005 19.129 0.001 ***
## Residual 20 337.21 0.12995
## Total 27 2594.89 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pairwise.adonis(PCA.dat.raw, factors=c(PCA.fac$Lake), sim.method = "euclidian", p.adjust.m = "none")
# Sunrise and Greenstone low rep, not able to make good comparisons
# below significant
# Lukens vs all others (Blue, Cascade, EasternBrook, Granite 2) (not diff from May)
# May Pond vs. others (Blue, Cascade, EasternBrook, Granite 2) (not diff from Lukens)
# Granite2 vs. others (EasternBrook, Cascade, Blue, MayPond, Lukens)
############### ####### ####### #######
Now, perform a PCA on the non-normalized data, this shows the ellipses according to site, pooling across full plankton communities. Sources are excluded.
### PCA
# run the PCA on scaled and centered data
set.seed(138)
PC.plank<- prcomp(PCA.dat.raw, center = TRUE, scale= TRUE)
PC.plank.summary<-summary(PC.plank)
#plot(PC.plank, type="lines", main="PC.area eigenvalues")
# 2 PCs explain 96% of variation, most in 1 axis (9)
###### plot for PCA by Lake
## PC1 and PC2
PCA.lake.plank <- ggbiplot(PC.plank, choices = 1:2, obs.scale = 1, var.scale = 1,
groups=PCA.fac$Lake,
ellipse = TRUE, circle = FALSE, alpha=0, ellipse.prob=0.90) +
geom_point(aes(colour=PCA.fac$Lake,
shape=PCA.fac$phy.group), size = 2) +
geom_vline(xintercept=0, linetype="dashed", color = "gray60")+
geom_hline(yintercept=0, linetype="dashed", color = "gray60")+
scale_x_continuous(breaks=pretty_breaks(n=5))+
#scale_color_manual(values=PCA.col2)+
#scale_shape_manual(values=c(16,1,22,2,3)) +
ggtitle("ESS-raw")+
theme_classic()+
guides(color = guide_legend(reverse=TRUE))+
theme(legend.text=element_text(size=10),
panel.background = element_rect(colour = "black", linewidth=1),
aspect.ratio=0.8, axis.ticks.length=unit(0.2, "cm"),
axis.text.y=element_text(margin=unit(c(0.5, 0.5, 0.5, 0.5), "cm")),
axis.text.x=element_text(margin=unit(c(0.5, 0.5, 0.5, 0.5), "cm")))
PCA.lake.plank
Figure. Principal component analysis of measured AAESS δ13C values for zooplankton groups across sampling locations, ordered from low (Lukens) to high (Granite 2) elevation.
######
pdf(file= "figures/FigS4B.PCA.location.pdf", height=7, width=6)
plot_grid(PCA.lake.plank)
dev.off()
This code chunk generated the PERMANOVA testing for zooplankton and sources - This is Table S4B-top
# factors: includes the zooplankton groups (3) plus the sources (2)
PCA.fac.raw.nonnorm.zoop<- PCA.df %>%
dplyr::select(Lake, phy.group)
# the response variables
PCA.dat.raw.nonnorm.zoop<- PCA.df %>%
dplyr::select(Ile, Leu, Phe, Thr, Val)
####### ####### ####### ####### #######
####### Run a PERMANOVA for Lake, no sources, non-norm EAAs
set.seed(213)
EAA.perm.nonnorm.zoop<-adonis2(PCA.dat.raw.nonnorm.zoop~phy.group, data=PCA.fac.raw.nonnorm.zoop,
permutations=999, method="euclidian")
EAA.perm.nonnorm.zoop # different at p=0.006, R2=0.23
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = PCA.dat.raw.nonnorm.zoop ~ phy.group, data = PCA.fac.raw.nonnorm.zoop, permutations = 999, method = "euclidian")
## Df SumOfSqs R2 F Pr(>F)
## phy.group 4 919.0 0.23414 3.1337 0.006 **
## Residual 41 3006.1 0.76586
## Total 45 3925.1 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# pairwise tests
set.seed(211)
pairwise.adonis(PCA.dat.raw.nonnorm.zoop, factors=c(PCA.fac.raw.nonnorm.zoop$phy.group),
sim.method = "euclidian", p.adjust.m = "none")
We see clear effects of site/sampling location that reflect elevation of lakes (possibly in interaction with other environmental factors). Examine how mean-normalization effects this, with the implication that mean-normalization is an important point in our system if using PCA to examine data patterns.
Larsen, T., T. Hansen, and J. Dierking. 2020. Characterizing niche differentiation among marine consumers with amino acid δ13C fingerprinting. Ecol. Evol. 10: 7768–7782. doi:10.1002/ece3.6502
We see that mean-normalization of AAESSgives greater resolution and increasing R2 of PLANKTON group effects and reducing site-effects.
# make the PCA df-normalized, remove proportion plant, KEEPING sources
# *Note* PCA.df above has sources in, removed later in 'PCA.df.trim'
# make ID column to run the normalization
PCA.df.normAA<-PCA.df
#make ID column to run the normalization
PCA.df.normAA$ID<-1:nrow(PCA.df.normAA)
for(i in 1:length(PCA.df.normAA$ID)){
PCA.df.normAA$Ile.n[i] <- (PCA.df.normAA$Ile[i]-mean(as.numeric(PCA.df.normAA[i,3:7])))
PCA.df.normAA$Leu.n[i] <- (PCA.df.normAA$Leu[i]-mean(as.numeric(PCA.df.normAA[i,3:7])))
PCA.df.normAA$Phe.n[i] <- (PCA.df.normAA$Phe[i]-mean(as.numeric(PCA.df.normAA[i,3:7])))
PCA.df.normAA$Thr.n[i] <- (PCA.df.normAA$Thr[i]-mean(as.numeric(PCA.df.normAA[i,3:7])))
PCA.df.normAA$Val.n[i] <- (PCA.df.normAA$Val[i]-mean(as.numeric(PCA.df.normAA[i,3:7])))
}
# sources and plankton groups now have been mean-normalized
Now that data has been mean-normalized, test effect of normalization on interpretation of lake effects. - Does normalization increase our resolution of plankton differences and account for site-level effects? - This is in Table S4-topB.
# what is effect of lake if we normalize>? subet the PCA.df.normAA df to REMOVE the sources
# only look at the plankton -- across site effects key here
PCA.df.normAA.noSource<-PCA.df.normAA[!(PCA.df.normAA$Lake=="Source"),]
# the response variables
PCA.dat.norm.noSource<- PCA.df.normAA.noSource %>%
dplyr::select(Ile.n, Leu.n, Phe.n, Thr.n, Val.n)
# the factors
PCA.fac.norm.noSource<- PCA.df.normAA.noSource %>%
dplyr::select(Lake, phy.group)
set.seed(213)
EAA.perm.norm.lake<-adonis2(PCA.dat.norm.noSource~ Lake, data=PCA.fac.norm.noSource,
permutations=999, method="euclidian")
EAA.perm.norm.lake # NOT DIFFERENT at p=0.069
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = PCA.dat.norm.noSource ~ Lake, data = PCA.fac.norm.noSource, permutations = 999, method = "euclidian")
## Df SumOfSqs R2 F Pr(>F)
## Lake 7 67.919 0.35098 1.5451 0.069 .
## Residual 20 125.596 0.64902
## Total 27 193.515 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now run the PERMANOVA on mean-normalized AAESS for all zooplankton pooled across sites, sources RETAINED. - This is in Table S4-bottomB
# the response variables
PCA.dat.norm<- PCA.df.normAA %>%
dplyr::select(Ile.n, Leu.n, Phe.n, Thr.n, Val.n)
# the factors
PCA.fac.norm<- PCA.df.normAA %>%
dplyr::select(Lake, phy.group)
####### ####### ####### ####### #######
####### Run a PERMANOVA on NORMALIZED values for plankton and sources -- testing source and zoopl.
# run PERMANOVA
set.seed(213)
EAA.perm.norm.plank<-adonis2(PCA.dat.norm~ phy.group, data=PCA.fac.norm, permutations=999, method="euclidian")
EAA.perm.norm.plank # different at p=0.001
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = PCA.dat.norm ~ phy.group, data = PCA.fac.norm, permutations = 999, method = "euclidian")
## Df SumOfSqs R2 F Pr(>F)
## phy.group 4 394.57 0.50659 10.524 0.001 ***
## Residual 41 384.30 0.49341
## Total 45 778.87 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# pairwise
set.seed(217)
pairwise.adonis(PCA.dat.norm, factors=c(PCA.fac.norm$phy.group), sim.method="euclidean", p.adjust.m = "none")
Perform a PCA on the mean-normalized data to examine patterns in PCA space across the zooplankton groups and relative to their sources (POM, terrestrial plants) - This is Figure S4B.
####### ####### ####### ####### #######
# run the PCA on scaled and centered data
set.seed(508)
PC.plank.norm<- prcomp(PCA.dat.norm, center = TRUE, scale= TRUE)
PC.plank.norm.summ<-summary(PC.plank.norm)
#plot(PC.plank.norm.summ, type="lines", main="PC.area eigenvalues")
###### plot for PCA by Lake
#LDA box plot by phy.group
phy.5.colors<-c("dodgerblue", "gray65", "orange","coral2", "springgreen4")
PCA.fac.norm$phy.group<-factor(PCA.fac.norm$phy.group, levels=c("POM", ">350um", "cladocera",
"copepoda","Plants"))
## PC1 and PC2
PCA.norm.plank <- ggbiplot(PC.plank.norm, choices = 1:2, obs.scale = 1, var.scale = 1,
groups=PCA.fac.norm$phy.group,
ellipse = TRUE, circle = FALSE, alpha=0, ellipse.prob=0.90) +
geom_point(aes(colour=PCA.fac.norm$phy.group), size = 2) +
geom_vline(xintercept=0, linetype="dashed", color = "gray60")+
geom_hline(yintercept=0, linetype="dashed", color = "gray60")+
scale_x_continuous(breaks=pretty_breaks(n=5))+
scale_color_manual(values=phy.5.colors)+
#scale_shape_manual(values=c(16,1,22,2,3))+
ggtitle("ESS-normalized")+
theme_classic()+
theme(legend.text=element_text(size=10),
panel.background = element_rect(colour = "black", linewidth=1),
axis.ticks.length=unit(0.2, "cm"),
axis.text.y=element_text(margin=unit(c(0.5, 0.5, 0.5, 0.5), "cm")),
axis.text.x=element_text(margin=unit(c(0.5, 0.5, 0.5, 0.5), "cm")))
PCA.norm.plank
Figure. Principal component analysis (PCA) for mean-normalized AAESS δ13C values (AAESS-norm) for zooplankton groups and sources (POM, terrestrial plants). Ellipses represent 90% confidence intervals, where group sample sizes are n ≥ 4.
### export it
pdf(file= "figures/FigS5.PCA.norm.plank.pdf", height=5, width=6)
plot_grid(PCA.norm.plank)
dev.off()
Use a linear discriminant analysis (LDA) to determine separation of primary producers and assign zooplankton to these groups as sources of AAEAA.
LDA has internal normalization approach, and we tested whether there is no difference in output whether data is mean-normalized or not and found no effect. Therefore, we proceed with the measured (non-mean-normalized data for subsequent analyses) of AAESS isotope fingerprinting and mixing models. In the LDA, we will use primary prodcers first, inspecting the error rate of a jackknifed - leave one out - model fit and compare classification rates.
#Run an LDA with a jackknifing model fit to look at error rate
#'CV = TRUE' makes the LDA run a jackknifed - leave one out - model fit
All.lda <- lda(Group ~ Ile + Leu + Phe + Thr + Val, data = prod, CV = TRUE)
#Create a table which compares the classification from the LDA model to the actual classification
All.reclass <- table(prod$Group, All.lda$class)
#Total percent of samples correctly classified is the sum of the diagonal of this table
sum(diag(prop.table(All.reclass)))
#Percent of each producer group correctly reclassified
diag(prop.table(All.reclass, 1))
#Create a training LDA function from the library data
#Note - you can't use the 'All.lda' object above because the 'CV = TRUE' command was used to create it, and for some reason this won't work with the predict() function
All.train <- lda(Group~ Ile + Leu + Phe + Thr + Val, data = prod)
All.train
#Write a csv file for the coefficients of LD1
lda.info <- as.data.frame(All.train$scaling)
write.csv(lda.info, "output/SNL_Producers_LDA_loadings_NMX.csv", row.names=FALSE)
#Create a data frame with these LDA coordinates
AlldataPredict <- data.frame(Group = prod$Group, SampleID = prod$SampleID, predict(All.train)$x)
#Write a csv file for the LDA coordinatinates
write.csv(AlldataPredict, "output/SNL_Producers_LDA_coords_NMX.csv", row.names=FALSE)
Use the LDA to classify the zooplankton as assigning to either POM or terrestrial plants. We will later use the LDA scores (LD1) in a mixing model.
#Classify zooplankton
zoop.predict <- predict(object = All.train, newdata = zoops)
zoop.predict.data <- data.frame(SampleID = zoops$SampleID, Type = zoops$Type, Lake = zoops$Lake, zoop.predict)
#Write a csv file for the zooplankton LDA info
write.csv(zoop.predict.data, "output/SNL_Zooplankton_LDA_info_NMX.csv", row.names=FALSE)
Now that we have new data, let’s combine it with the previous data so that we have a unified dataframe with the measured AAESS data, sampling information, and LD-classification and LD1 coordinate.
##############
# combine the zooplankton and source LDA
##############
####### Producers
# AlldataPredict.mod original columns as - "Group", "SampleID", "LD1"
# rearrange
AlldataPredict.mod<- AlldataPredict %>%
dplyr::select("SampleID", "Group", "LD1")
# rename columns
colnames(AlldataPredict.mod)<-c("SampleID", "orig.class", "LD1")
# make new columns to allow merge
AlldataPredict.mod$Lake<-"Source"
AlldataPredict.mod$LD.class<-"Source"
# rearrange
AlldataPredict.mod<- AlldataPredict.mod %>%
dplyr::select("SampleID", "orig.class", "Lake", "LD.class", "LD1")
############# Zooplankton
# grab columns we want
zoop.predict.data.NMX<- zoop.predict.data %>%
dplyr::select("SampleID", "Type", "Lake", "class", "LD1")
# rename factors
zoop.predict.data.NMX <- zoop.predict.data.NMX %>%
dplyr::rename("orig.class" = "Type")
zoop.predict.data.NMX <- zoop.predict.data.NMX %>%
dplyr::rename("LD.class" = "class")
####### bind the 2 dfs
LDA.df<-rbind(zoop.predict.data.NMX, AlldataPredict.mod)
# make new level for "phy.group" based on taxonomy
LDA.df$phy.group<- ifelse(LDA.df$orig.class == "calanoid", "copepoda",
ifelse(LDA.df$orig.class == "calanoid_cyclopoid", "copepoda",
ifelse(LDA.df$orig.class == "Ceriodaphnia", "cladocera",
ifelse(LDA.df$orig.class == "Daphnia", "cladocera",
ifelse(LDA.df$orig.class == "Holopedium", "cladocera",
ifelse(LDA.df$orig.class == "Ceriodaphnia", "cladocera",
ifelse(LDA.df$orig.class == ">350um", ">350um",
ifelse(LDA.df$orig.class == "Plants", "Plants",
"POM"))))))))
# make new level for "Elevation" (in m), "0" for ordering, but actually is "NA"
LDA.df$Elevation<- ifelse(LDA.df$Lake == "Lukens", "2513",
ifelse(LDA.df$Lake == "MayPond", "2714",
ifelse(LDA.df$Lake == "Sunrise2", "2830",
ifelse(LDA.df$Lake == "Blue", "3054",
ifelse(LDA.df$Lake == "Greenstone", "3091",
ifelse(LDA.df$Lake == "Cascade", "3140",
ifelse(LDA.df$Lake == "EasternBrook", "3155",
ifelse(LDA.df$Lake == "Granite2", "3178",
"0"))))))))
# rearrange
LDA.df<- LDA.df %>%
dplyr::select("SampleID", "Lake", "Elevation", "orig.class", "phy.group", "LD.class", "LD1")
write.csv(LDA.df, "output/Producer_zoop_LDA_score.csv", row.names=FALSE)
Use the LDA scores, plot each score for each sample to see the variance and site-differences.
##### can load in from output '("output/Producer_zoop_LDA_score.csv")'
LDA.df$Elevation<-as.numeric(LDA.df$Elevation)
# reorder lake by relative elevation
LDA.df$Lake<-reorder(LDA.df$Lake, LDA.df$Elevation)
# reorder factors
LDA.df$orig.class<-factor(LDA.df$orig.class, levels=c(">350um", "calanoid",
"calanoid_cyclopoid", "Ceriodaphnia",
"Daphnia", "Holopedium", "POM", "Plants"))
phy.lda.col<-c("gray65", "firebrick1","firebrick1", "goldenrod1", "goldenrod1","goldenrod1", "dodgerblue","springgreen4")
phy.lda.pts<-c(21, 21,24, 21,24,22, 21,21)
###
LDA.spp.lake<-ggplot(LDA.df, aes(x=LD1, y=Lake,
fill = orig.class, shape= orig.class))+
geom_point(size=3)+
geom_vline(xintercept=0.4, linetype="dashed", color = "gray60")+
annotate(geom="text", label="Plant-Supported", x=-3.5, y=9.5, color="springgreen4", size=3) +
annotate(geom="text", label="POM-Supported", x=2.5, y=9.5, color="dodgerblue", size=3) +
scale_fill_manual(values=phy.lda.col)+
scale_shape_manual(values=phy.lda.pts)+
ylab("Site: low-to-high elevation") +
xlab("LD1") +
theme_classic()
LDA.spp.lake
Figure. Linear discriminant analysis of AAESS δ13C values, with sample assignment across LD1 (linear discriminant-axis 1) from lowest (Lukens) to highest elevation (Granite 2). Dashed vertical line shows assignment of zooplankton samples as either supported by terrestrial plants or POM.
dev.copy(pdf, "figures/FigS6.LDA.spp.lake.pdf", width = 8, height = 6)
dev.off()
Use the LDA scores to make boxplots showing the assignment for each zooplankton group. This pools across all sites and emphasizes functional groups/taxa.
####################
#LDA box plot by phy.group
phy.5.colors<-c("dodgerblue", "gray85", "orange","coral2", "springgreen4")
LDA.df$phy.group<-factor(LDA.df$phy.group, levels=c("POM", ">350um", "cladocera",
"copepoda",
"Plants"))
LDA.phy.boxplot<-ggplot(LDA.df, aes(x=LD1, y=phy.group,
fill = phy.group))+
geom_boxplot(alpha=0.9)+
geom_jitter(aes(fill=phy.group), colour="black",pch=21, size=2) +
geom_vline(xintercept=0.4, linetype="dashed", color = "gray60")+
annotate(geom="text", label="Plant-Supported", x=-3.5, y=5.5, color="springgreen4", size=3) +
annotate(geom="text", label="POM-Supported", x=2.5, y=5.5, color="dodgerblue", size=3) +
scale_fill_manual(values=phy.5.colors)+
scale_color_manual(values=phy.5.colors)+
ylab("Consumer or Source") +
xlab("LD1") +
theme_classic()
LDA.phy.boxplot
Figure. Boxplots of linear discriminant 1 (LD1) coordinates from linear discriminant analysis of AAEES δ13C values. Dashed vertical lines demarcates assignment of zooplankton samples with terrestrial plants (left) or POM (right).
dev.copy(pdf, "figures/Fig2.LDA.phy.boxplot.pdf", height = 6, width = 6)
dev.off()
Firest, export zooplankton and source data to feed into MixSIAR. We will calcualte the mean, standard deviations, and sample sized for LD1 values of the sources.
#Load Packages for MixSIAR Models .... should already load in setup chunk, but just in case
library(MixSIAR)
library(dplyr)
library(GGally)
#Create a new data frame with desired columns (Zooplankton)
Mix.df <- LDA.df %>%
dplyr::select("SampleID", "orig.class", "phy.group", "Lake", "Elevation", "LD1")
#Remove Plants from the data frame
Mix.df <- Mix.df[!(Mix.df$orig.class=="Plants"),]
#Remove POM from the data frame
Mix.df <- Mix.df[!(Mix.df$orig.class=="POM"),]
#Remove TB-tanks from the data frame
Mix.df <- Mix.df[!(Mix.df$Lake=="TB-tanks"),]
Mix.df.summary.Lake<-aggregate(LD1~Lake, data=Mix.df, FUN=length)
Mix.df.summary.phy<-aggregate(LD1~phy.group, data=Mix.df, FUN=length)
Mix.df.summary.phy.lake<-aggregate(LD1~Lake+phy.group, data=Mix.df, FUN=length)
#Write and export a csv of the zooplankton data frame
write.csv(Mix.df, "output/Mix.df.csv", row.names=FALSE)
#Create a new data frame with desired columns (Sources)
Source.df <- LDA.df[(LDA.df$Lake=="Source"),]
Source.df <- Source.df %>%
dplyr::select("orig.class", "LD1")
#Calculate mean LD1 values of the sources
mean.df <- aggregate(LD1~orig.class, data=Source.df, FUN=mean)
#Calculate standard deviations for LD1 values of the sources
sd.df <- aggregate(LD1~orig.class, data=Source.df, FUN=sd)
#Calculate the number of samples for each of the sources
n.df <- aggregate(LD1~orig.class, data=Source.df, FUN=length)
#Bind the values calculated above into one data frame
source.agg.df <- cbind(n.df, mean.df[2], sd.df[2])
#Rename the columns in this data frame
colnames(source.agg.df) <- c("orig.class", "n", "MeanLD1", "SDLD1")
#Write and export a csv of the source data frame
write.csv(source.agg.df, "output/source.agg.df.csv", row.names=FALSE)
Now import data and create model structure. This code will run
through a series of loop models and produce outputs for a
"by sample" model followed by a
complete model. - Outputs from each model will be compared
- Compiled model outputs are reported in Table S6.
#Load zooplankton (consumer) data and assign factors
cons <- load_mix_data(filename = "output/Mix.df.csv",
iso_names=c("LD1"),
factors=c("SampleID"),
fac_random=c(FALSE),
fac_nested=c(FALSE),
cont_effects=NULL)
#SampleID is a fixed factor
#Load source data
source <- load_source_data(filename="output/source.agg.df.csv",
source_factors=NULL,
conc_dep=FALSE,
data_type="means",
cons)
#Load TDF data
discr <- load_discr_data(filename="data/MixSIAR_TDFs.csv", cons)
##### Check data
# Make an isospace plot
# isospace_plot <- plot_data(filename="output/MixSIAR_isospace_plot_SampleIDs", plot_save_pdf=FALSE, plot_save_png=FALSE, cons,source,discr)
############
# Run MixSIAR by-sample Model
set.seed(12)
#Define the model and error structure and write JAGS model file
model_filename <- "output/MixSIAR_models/SampleID_models/MixSIAR_Model_SampleID_fixed.txt"
resid_err <- FALSE
process_err <- TRUE
write_JAGS_model(model_filename, resid_err, process_err, cons, source)
#Run the JAGS model
invt.jags.mod <- run_model(run="normal", cons, source, discr, model_filename, alpha.prior=1, resid_err, process_err)
#Process diagnostics, summary stats, and posterior plots
output_JAGS(invt.jags.mod, cons, source, output_options=list(
summary_save = TRUE,
summary_name = "output/MixSIAR_models/SampleID_models/MixSIAR_SampleID_summary_stats",
sup_post = FALSE,
plot_post_save_pdf = TRUE,
plot_post_name = "output/MixSIAR_models/SampleID_models/MixSIAR_SampleID_posterior_density",
sup_pairs = TRUE,
plot_pairs_save_pdf = TRUE,
plot_pairs_name = "output/MixSIAR_models/SampleID_models/MixSIAR_SampleID_pairs_plot",
sup_xy = TRUE,
plot_xy_save_pdf = TRUE,
plot_xy_name = "output/MixSIAR_models/SampleID_models/MixSIAR_SampleID_xy_plot",
gelman = TRUE,
heidel = FALSE,
geweke = TRUE,
diag_save = TRUE,
diag_name = "output/MixSIAR_models/SampleID_models/MixSIAR_SampleID_diagnostics",
indiv_effect = FALSE,
plot_post_save_png = FALSE,
plot_pairs_save_png = FALSE,
plot_xy_save_png = FALSE))
graphics.off()
#save all results
save.image("output/MixSIAR_models/SampleID_models/MixSIAR_SampleID_fixed_data.RData")
# *By Sample OUTPUT*: Clean and process summary stats for the `"by sample model"` and generate an output
# read in the summary stats from mixSIAR, parse and make a dataframe, replacing spaces from .txt
SAMP.mixSIAR.output<-read.table("output/MixSIAR_models/SampleID_models/MixSIAR_SampleID_summary_stats.txt", sep='\t', header=TRUE, skip=6)
SAMP.mixSIAR.output<-as.data.frame(SAMP.mixSIAR.output)
colnames(SAMP.mixSIAR.output)<-"full.data"
# remove all extra spaces using "squish", make them 1 space only
library(stringr)
SAMP.mixSIAR.output$full.data<- str_squish(SAMP.mixSIAR.output$full.data)
# rename columns
library(plyr)
samp.mix.out<-separate(SAMP.mixSIAR.output, full.data, into = c("factor", "Mean", "SD", "2.5.perc", "5.perc", "25.perc", "50.perc", "75.perc", "95.perc", "97.5.perc"), sep=" ")
# parse the factors here into 3 columns
new.cols<- stringr::str_split_fixed(samp.mix.out$factor, "\\.", 3) %>%
as.data.frame() %>%
setNames(c("proportion", "SampleID", "Source"))
new.cols<- new.cols %>%
dplyr::select("SampleID", "Source")
# combine the new, separated columns with the data, verify the levels match, then remove "factor"
Samp.out.cleaned<-cbind(new.cols, samp.mix.out[1:10])
Samp.out.cleaned<-Samp.out.cleaned %>%
dplyr::select(-factor)
write.csv(Samp.out.cleaned, "output/MixSIAR_Samp.df.csv", row.names=FALSE)
#### MixSIAR: complete models
#Now import data and create model structure. This code will run through a series of loop models and produce outputs for a `complete model`.
n.mod <- 4
mix <- vector("list", n.mod)
#Define mixtures for each model
#Null Model
mix[[1]] <- load_mix_data(filename="output/Mix.df.csv",
iso_names=c("LD1"),
factors=NULL,
fac_random=NULL,
fac_nested=NULL,
cont_effects=NULL)
#Group - Fixed; Taxa - Random, Nested
mix[[2]] <- load_mix_data(filename="output/Mix.df.csv",
iso_names=c("LD1"),
factors=c("phy.group", "orig.class"),
fac_random=c(FALSE, TRUE),
fac_nested=c(FALSE, TRUE),
cont_effects=NULL)
#Group - Fixed; Lake - Random, Nested
mix[[3]] <- load_mix_data(filename="output/Mix.df.csv",
iso_names=c("LD1"),
factors=c("phy.group", "Lake"),
fac_random=c(FALSE, TRUE),
fac_nested=c(FALSE, TRUE),
cont_effects=NULL)
#Elevation - Continuous; Group - Random
mix[[4]] <- load_mix_data(filename="output/Mix.df.csv",
iso_names=c("LD1"),
factors="phy.group",
fac_random=TRUE,
fac_nested=FALSE,
cont_effects="Elevation")
##### ##### ##### #####
##### Run MixSIAR complete models
set.seed(12)
#Run the models
source <- vector("list", n.mod)
discr <- vector("list", n.mod)
jags.mod <- vector("list", n.mod)
#Run loop
for(mod in 1:n.mod){
# create sub-directory and move into it
mainDir <- getwd()
subDir <- paste0("model_", mod)
dir.create(file.path(mainDir, subDir), showWarnings = FALSE)
setwd(file.path(mainDir, subDir))
#Load source data
source[[mod]] <- load_source_data(filename="~/Desktop/Github/Sierra-plankton-CSIA/output/source.agg.df.csv",
source_factors=NULL,
conc_dep=FALSE,
data_type="means",
mix[[mod]])
#Load TDF data
discr[[mod]] <- load_discr_data(filename="~/Desktop/Github/Sierra-plankton-CSIA/data/MixSIAR_TDFs.csv", mix[[mod]])
#Define model structure and write JAGS model file
model_filename <- paste0("MixSIAR_model_", mod, ".txt")
resid_err <- TRUE
process_err <- TRUE
write_JAGS_model(model_filename, resid_err, process_err, mix[[mod]], source[[mod]])
#Run JAGS model
jags.mod[[mod]] <- run_model(run="very long", mix[[mod]], source[[mod]], discr[[mod]], model_filename, alpha.prior=1, resid_err, process_err)
#Process diagnostics, summary stats, and posterior plots
output_JAGS(jags.mod[[mod]], mix[[mod]], source[[mod]], output_options=list(
summary_save = TRUE,
summary_name = "summary_statistics",
sup_post = FALSE,
plot_post_save_pdf = TRUE,
plot_post_name = "posterior_density",
sup_pairs = TRUE,
plot_pairs_save_pdf = TRUE,
plot_pairs_name = "pairs_plot",
sup_xy = TRUE,
plot_xy_save_pdf = FALSE,
plot_xy_name = "xy_plot",
gelman = TRUE,
heidel = FALSE,
geweke = TRUE,
diag_save = TRUE,
diag_name = "diagnostics",
indiv_effect = FALSE,
plot_post_save_png = FALSE,
plot_pairs_save_png = FALSE,
plot_xy_save_png = FALSE))
graphics.off()
setwd(mainDir)
}
# Save all results
save.image("MixSIAR_Model_Comparisons_Data.RData")
##### Compare Models
#Use 'compare_models' to get table with LOOic weights
names(jags.mod) <- c("null", "group_taxa","group_lake", "elev_group")
comparison.table <- compare_models(jags.mod)
comparison.table <- as.data.frame(comparison.table)
write.csv(comparison.table, "MixSIAR_Model_Comparisons_Table.csv")
##### Clean and process summary stats
#This is the data for the MixSIAR full output
# read in the summary stats from mixSIAR, parse and make a dataframe, replacing spaces from .txt
mod3.mixSIAR.output<-read.table("model_3/summary_statistics.txt",
sep='\t', header=TRUE, skip=6)
mod3.mixSIAR.output<-as.data.frame(mod3.mixSIAR.output)
colnames(mod3.mixSIAR.output)<-"full.data"
# remove all extra spaces using "squish", make them 1 space only, using library(stringr)
mod3.mixSIAR.output$full.data<- str_squish(mod3.mixSIAR.output$full.data)
# separate using library(plyr)
mod3.out<-separate(mod3.mixSIAR.output, full.data, into = c("factor", "Mean", "SD", "2.5.perc", "5.perc", "25.perc", "50.perc", "75.perc", "95.perc", "97.5.perc"), sep=" ")
# remove top 2 rows from output
mod3.out<- mod3.out[-c(1:2),]
### top 6 rows need to be handles differently since there is no "lake" here
mod3.out.phy<-mod3.out[c(1:6), ]
new.cols.mod3.phy<- stringr::str_split_fixed(mod3.out.phy$factor, "\\.", 3) %>%
as.data.frame() %>%
setNames(c("proportion", "phy.group", "Source"))
#make NA column for Lake
new.cols.mod3.phy$Lake<-NA
#rearrange
new.cols.mod3.phy<-new.cols.mod3.phy %>%
dplyr::select("phy.group", "Lake", "Source")
### now, the rest of data, by Lake
mod3.out.by.lake<-mod3.out[-c(1:6), ]
# parse the factors here
new.cols.mod3.lake<- stringr::str_split_fixed(mod3.out.by.lake$factor, "\\.", 4) %>%
as.data.frame() %>%
setNames(c("proportion", "phy.group", "Lake", "Source"))
new.cols.mod3.lake<- new.cols.mod3.lake %>%
dplyr::select("phy.group", "Lake", "Source")
# combine the new, separated columns with the data
Mod3.all.facs<-rbind(new.cols.mod3.phy, new.cols.mod3.lake)
# combine with data
Mod3.out.cleaned<-cbind(Mod3.all.facs, mod3.out[1:10]) # verify the levels match, then remove "factor"
Mod3.out.cleaned<-Mod3.out.cleaned %>%
dplyr::select(-factor)
write.csv(Mod3.out.cleaned, "output/MixSIAR_Mod3.df.csv", row.names=FALSE)
Make plots from mixSIAR. We will make a plot for each zooplankton group and one for each site.
############### now makes some plots and run models
mod3<-read.csv("output/MixSIAR_Mod3.df.csv", check.names=FALSE)
#check.names=FALSE prevents an "X" being added to the columns that start with a number
# make factors for columns
make.fac<-c("phy.group", "Source")
mod3[make.fac] <- lapply(mod3[make.fac], factor)
mod3$Source<-factor(mod3$Source, levels=c("POM", "Plants"))
############# Look at the groups pooled across the lakes
# phy group effects, across lakes
mod3.phy<-mod3[(1:6),]
mod3.phy.fig<-ggplot(data=mod3.phy, aes(x=phy.group, y=Mean, fill=Source))+
scale_fill_manual(values=c("dodgerblue", "springgreen3"))+
geom_errorbar(aes(ymin=Mean-SD, ymax=Mean+SD, width=.2),
position=position_dodge(0.5))+
geom_point(size=4, position=position_dodge(0.5), color="black", pch=21)+
ylab("Proportion of EAA-source") +
xlab("Zooplankton Groups") +
theme_classic()
mod3.phy.fig
### export it
pdf(file= "figures/Fig3.Phy.mixsiar.pdf", height=5, width=5)
mod3.phy.fig
dev.off()
############# Look at the groups within each lake
# phy group effects, across lakes
mod3.lake<-mod3[-(1:6),]
# make new level for "Elevation" (in m)
mod3.lake$Elevation<- as.numeric(ifelse(mod3.lake$Lake == "Lukens", "2513",
ifelse(mod3.lake$Lake == "MayPond", "2714",
ifelse(mod3.lake$Lake == "Sunrise2", "2830",
ifelse(mod3.lake$Lake == "Blue", "3054",
ifelse(mod3.lake$Lake == "Greenstone", "3091",
ifelse(mod3.lake$Lake == "Cascade", "3140",
ifelse(mod3.lake$Lake == "EasternBrook", "3155",
ifelse(mod3.lake$Lake == "Granite2", "3178",
"0")))))))))
# reorder lake by relative elevation
mod3.lake$Lake<-as.factor(reorder(mod3.lake$Lake, mod3.lake$Elevation))
### make the plot
mod3.lake.fig<-ggplot(data=mod3.lake, aes(x=Lake, y=Mean, fill=Source))+
facet_wrap(.~phy.group)+
scale_fill_manual(values=c("dodgerblue", "springgreen3"))+
geom_errorbar(aes(ymin=Mean-SD, ymax=Mean+SD, width=.2),
position=position_dodge(0.5))+
geom_point(size=4, position=position_dodge(0.5), color="black", pch=21)+
ylab("Proportion of EAA-source") +
xlab("Lakes: low-to-high elevation") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
mod3.lake.fig
### export it
pdf(file= "figures/FigS7.Lake.mixsiar.pdf", height=7, width=12)
mod3.lake.fig
dev.off()
Figure. Proportional contributions of two AAESS sources (POM and terrestrial plants) to zooplankton consumers (left) pooled across sites and (right) at each site, estimated from a MixSIAR model that included group as fixed effect and location as random effect.
Combine all data
Now that we have more data, let’s combine it with the previous data so
that we have a unified dataframe with the sampling information, measured
AAESS data, LD-classification, LD1 coordinate, and mixing
model output.
#### by sample
#check.names=FALSE prevents an "X" being added to the columns that start with a number
samp.mix.df<-read.csv("output/MixSIAR_Samp.df.csv", check.names=FALSE)
#### merge data from mixsiar with other data
# read in the EAA values for zoops (loaded above, "output/SNL_zooplankton_ess_AA_data.csv", and reorganized in "zoop.sub")
# combine zoops and producers, called in from 'ESS.df.long'
head(ESS.df.long)
MSiar.LD.meta <- merge.data.frame(samp.mix.df, LDA.df, by.x = "SampleID", by.y = "SampleID", all=TRUE)
MSiar.LD.meta.EAA <- merge.data.frame(MSiar.LD.meta, ESS.df.long,
by.x = "SampleID", by.y = "SampleID", all=TRUE)
#### simplify data
# select useful columns
MSiar.LD.meta.EAA<- MSiar.LD.meta.EAA %>%
dplyr::select("SampleID", "Source", "Mean", "SD", "Lake.x",
"orig.class", "phy.group.y", "LD.class", "LD1", "Group",
"Ile", "Leu", "Phe", "Thr", "Val")
MSiar.LD.meta.EAA<- MSiar.LD.meta.EAA %>%
dplyr::rename("Lake" = "Lake.x", "phy.group" = "phy.group.y")
# read in environmental data (loaded above, <-read.csv("data/SNL.envdata.csv"), "SNL.env"
MSiar.data.env<-merge.data.frame(MSiar.LD.meta.EAA, SNL.env, by.x = "Lake", by.y = "Lake",all=TRUE)
# select useful columns
MSiar.data.env<- MSiar.data.env %>%
dplyr::select("Location", "Date", "Lake", "elevation..m", "latitude", "longitude", "depth..m",
"secchi..m", "temp..C", "DO..perc", "cond", "pH", "DOC..mg.L",
"TN..ug.L", "TP..ug.L", "chla..ug.L", "SampleID", "orig.class", "phy.group",
"Ile", "Leu", "Phe", "Thr", "Val",
"LD.class", "LD1", "LD.class", "Source", "Mean", "SD")
# (can't subset data easily above with the NAs for 'Lake Source'
# separate the 3 types of data -- a bit messy because of column "mean", which is the proportion of source 1 vs 2
MSiar.data.env.plant.prop<-na.omit(MSiar.data.env[(MSiar.data.env$Source=="Plants"),])
MSiar.data.env.POM.prop<-na.omit(MSiar.data.env[(MSiar.data.env$Source=="POM"),])
MSiar.data.env.Source<-MSiar.data.env[(MSiar.data.env$LD.class=="Source"),]
# rename PLANT prop mean
MSiar.data.env.plant.prop<- MSiar.data.env.plant.prop %>%
dplyr::rename("prop.plant.mean" = "Mean")
MSiar.data.env.plant.prop<- MSiar.data.env.plant.prop %>%
dplyr::rename("prop.plant.SD" = "SD")
# rename POM prop mean
MSiar.data.env.POM.prop<- MSiar.data.env.POM.prop %>%
dplyr::rename("prop.POM.mean" = "Mean")
MSiar.data.env.POM.prop<- MSiar.data.env.POM.prop %>%
dplyr::rename("prop.POM.SD" = "SD")
#### add the Sources back in
# make dumpy NA columns so the headers match
MSiar.data.env.Source1<- MSiar.data.env.Source %>%
dplyr::rename("prop.plant.mean" = "Mean")
MSiar.data.env.Source1<- MSiar.data.env.Source1 %>%
dplyr::rename("prop.plant.SD" = "SD")
# merge data, has plants and sources
Pooled1.plant.df <- rbind(MSiar.data.env.Source1, MSiar.data.env.plant.prop)
### do the same for POM to bring in the sources
# add the Sources back in
MSiar.data.env.Source2<- MSiar.data.env.Source %>%
dplyr::rename("prop.POM.mean" = "Mean")
MSiar.data.env.Source2<- MSiar.data.env.Source2 %>%
dplyr::rename("prop.POM.SD" = "SD")
# merge data, has POM and sources
Pooled2.POM.df <- rbind(MSiar.data.env.Source2, MSiar.data.env.POM.prop)
#reduce columns needed for merged data
Pooled2.POM.select.df<- Pooled2.POM.df %>% dplyr::select (SampleID, prop.POM.mean, prop.POM.SD)
# merge data, this is all the data, merged by ID, with the new columns for prop POM or plant
final.df <- merge.data.frame(Pooled1.plant.df, Pooled2.POM.select.df, by.x = "SampleID", by.y = "SampleID")
#remove Source column, which is relic of output from MixSIAR
final.df<- final.df %>% dplyr::select (-Source)
#### export full data for project
write.csv(final.df, "output/SNL.CSIA.fulldata.csv")
Use the proportion of AAESS from terrestrial plants from the mixing model with a multiple linear regressions analysis to find the environmental conditions that correlate with each other and exhibit a relationship with changes in allochthony for each zooplanton category.
# use 'ESS.mods' to run models
# look at % plant contributions by environmental data
mod.data<- final.df %>%
dplyr::select(SampleID, Lake, elevation..m, depth..m, temp..C, DO..perc, pH, DOC..mg.L,
TN..ug.L, TP..ug.L, chla..ug.L, phy.group, Ile, Leu, Phe, Thr, Val, LD.class,
LD1, prop.plant.mean, prop.plant.SD)
# remove data where elevation for samples is missing (ie the source samples that were pooled across all sampling sites)
Plant.nut.env<-mod.data[!(mod.data$Lake=="Source"),]
############# ############# ############# #############
############# look at correlation matrix, what variables strongly correlate?
# Install and load the ggcorrplot package
# Keep predictors for the model
reduced_data <- Plant.nut.env %>%
dplyr::select(elevation..m, temp..C, pH,
TN..ug.L, TP..ug.L, chla..ug.L, DOC..mg.L)
# Compute correlation at 2 decimal places
corr_matrix = round(cor(reduced_data), 2)
# Compute and show the result
ggcorrplot(corr_matrix, hc.order = TRUE, type = "lower",
lab = TRUE)
Figure Correlation matrix displaying Pearson’s coefficient (r) between environmental and biogeochemical metrics at eight aquatic habitats in the Eastern Sierras Nevada Mountains. Positive and negative coefficients indicate positive and negative correlations between metrics, respectively.
### export it
dev.copy(pdf, "figures/FigS2.correl.matrix.pdf", height=6, width=8)
dev.off()
# from here, TN, TP, DOC all correlate with >0.77 (TN and DOC at 0.94!) So only need 1 of these in a multiple linear regression
Multiple regressions for ALL zooplankton = >350 um, cladocerans, copepods
# library(MASS)
# run a full model
mod.mult<-lm(prop.plant.mean~temp..C+ pH+TN..ug.L+chla..ug.L,
data=Plant.nut.env)
# model fit
summary(mod.mult)
# check distribution and residuals
hist(mod.mult$residuals)
qqnorm(mod.mult$residuals)
qqline(mod.mult$residuals)
sigma(mod.mult)/mean(Plant.nut.env$prop.plant.mean) #70% error rate, not awesome.
# run model selection
all.zoop.mods<-stepAIC(mod.mult, trace=TRUE)
# best model is with chla (p=0.111) and TN (p=0.009), R2-adjust = 0.186, RSE= 0.099
mod.mult.all<-lm(prop.plant.mean~TN..ug.L+chla..ug.L,
data=Plant.nut.env)
summary(mod.mult.all)
Anova(mod.mult.all, type=2)
Anova(mod.mult.all, type=2)
## Anova Table (Type II tests)
##
## Response: prop.plant.mean
## Sum Sq Df F value Pr(>F)
## TN..ug.L 0.078519 1 8.0487 0.0089 **
## chla..ug.L 0.026679 1 2.7348 0.1107
## Residuals 0.243887 25
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
### if doing single models, the outcome is the same -- trend for chla and signifiance for DOC, TN, TP (which all correlate)
# TN, all groups
N.prop.plant<-ggplot(Plant.nut.env, aes(x=TN..ug.L, y=prop.plant.mean)) +
geom_smooth(method = lm, color="purple", alpha=0.2, fill="purple")+
geom_point(color="violet", size=2)+
ylab("Proportion Plant-EAA")+
xlab("Total N (ug/L)")+
ggtitle("All zoops")+
theme_classic()
# Chloropyll, all groups
chla.prop.plant<-ggplot(Plant.nut.env, aes(x=chla..ug.L, y=prop.plant.mean)) +
geom_smooth(method = lm, color="purple", alpha=0.2, fill="purple")+
geom_point(color="violet", size=2)+
ylab("Proportion Plant-EAA")+
xlab("chlorophyll a (ug/L)")+
ggtitle("All zoops")+
theme_classic()
Multiple regressions Cladocera alone
#############################
##### Cladocerans
cladoc.plant<-Plant.nut.env[(Plant.nut.env$phy.group=="cladocera"),]
# run a full model, but 3 not defined because of singularities
mod.mult.cladoc<-lm(prop.plant.mean~temp..C+ pH+TN..ug.L+chla..ug.L,
data=cladoc.plant)
# model fit
summary(mod.mult.cladoc)
# check distribution and residuals
hist(mod.mult.cladoc$residuals)
qqnorm(mod.mult.cladoc$residuals)
qqline(mod.mult.cladoc$residuals)
sigma(mod.mult.cladoc)/mean(cladoc.plant$prop.plant.mean) #71% error rate
# run model selection
cladoc.mods<-stepAIC(mod.mult.cladoc, trace=TRUE)
# best model is with pH (p=0.081), Multiple R2 = 0.333, RSE= 0.092
mod.mult.cladoc.reduced<-lm(prop.plant.mean~pH, data=cladoc.plant)
summary(mod.mult.cladoc.reduced)
Anova(mod.mult.cladoc.reduced, type=2)
Anova(mod.mult.cladoc.reduced, type=2)
## Anova Table (Type II tests)
##
## Response: prop.plant.mean
## Sum Sq Df F value Pr(>F)
## pH 0.033736 1 3.9863 0.08094 .
## Residuals 0.067704 8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# this is best model for mutiple regression, RSE 0.104, R2-adjust =0.176
mod.mult<-lm(prop.plant.mean~TN..ug.L+chla..ug.L,
data=cladoc.plant)
summary(mod.mult)
sigma(mod.mult)/mean(Plant.nut.env$prop.plant.mean) #62% error rate
# Cladocera, near signif trend for pH (lower plant nutrition with increase in pH)
clad.pH.plant<-ggplot(cladoc.plant, aes(x=pH, y=prop.plant.mean)) +
geom_smooth(method = lm, color="goldenrod4", alpha=0.2, fill="goldenrod4")+
geom_point(color="goldenrod2", size=2)+
ylab("Proportion Plant-EAA")+
xlab("pH")+
ggtitle("Cladocera")+
theme_classic()
Multiple regressions Copepods alone
### separate dataframes
copep.plant<-Plant.nut.env[(Plant.nut.env$phy.group=="copepoda"),]
#############################
##### Copepods
# run a full model, but 3 not defined because of singularities
mod.mult.copepod<-lm(prop.plant.mean~temp..C+pH+TN..ug.L+chla..ug.L,
data=copep.plant)
# model fit
summary(mod.mult.copepod)
# check distribution and residuals
hist(mod.mult.copepod$residuals)
qqnorm(mod.mult.copepod$residuals)
qqline(mod.mult.copepod$residuals)
sigma(mod.mult.copepod)/mean(copep.plant$prop.plant.mean) #49% error rate.
# run model selection
copep.mods<-stepAIC(mod.mult.copepod, trace=TRUE)
# best model is with TN (p=0.041) and temp (0.229), R2-adjust = 0.328, RSE= 0.096
mod.mult.copepod<-lm(prop.plant.mean~TN..ug.L+temp..C,
data=copep.plant)
summary(mod.mult.copepod)
Anova(mod.mult.copepod, type=2)
Anova(mod.mult.copepod, type=2)
## Anova Table (Type II tests)
##
## Response: prop.plant.mean
## Sum Sq Df F value Pr(>F)
## TN..ug.L 0.057533 1 6.2492 0.041 *
## temp..C 0.015990 1 1.7368 0.229
## Residuals 0.064446 7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# copepods, N significant
copepod.TN<-ggplot(copep.plant, aes(x=TN..ug.L, y=prop.plant.mean)) +
geom_smooth(method = lm, color="darkred", alpha=0.2, fill="darkred")+
geom_point(color="coral", size=2)+
ylab("Proportion Plant-EAA")+
xlab("Total N (ug/L)")+
ggtitle("Copepoda")+
theme_classic()
# copepods, Temp barely a trend, nothing clear
copepod.Temp<-ggplot(copep.plant, aes(x=temp..C, y=prop.plant.mean)) +
geom_smooth(method = lm, color="darkred", alpha=0.2, fill="darkred")+
geom_point(color="coral", size=2)+
ylab("Proportion Plant-EAA")+
xlab("Temperature (C)")+
ggtitle("Copepoda")+
theme_classic()
Plots for multiple regression output
multregres.plots<-plot_grid(N.prop.plant, clad.pH.plant, copepod.TN,
ncol=3, nrow=1)
multregres.plots
Figure. Best-fit terms applied to multiple linear regression models testing relationships between environmental metrics and the proportional contribution of AAESS from terrestrial plants to zooplankton
dev.copy(pdf, "figures/Fig4.Multregrss.pdf", height=3.5, width=9)
dev.off()